feat: custom sharding weights #38624
Conversation
pavelfeldman left a comment:

LGTM modulo the CLI flag.
- type: <[null]|[Object]>
  - `total` <[int]> The total number of shards.
  - `current` <[int]> The index of the shard to execute, one-based.
  - `weights` ?<[Array]<[int]>> The shard weights.
Let's not expose them for now.
docs/src/test-sharding-js.md (outdated):

## Rebalancing Shards

```
npx playwright test --shard=1/4,26/24/25/25
```
I'd say `--shard=1/4 --shard-weights=26:24:25:25` to keep things readable.
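For illustration, here is a rough sketch of how such weights could map test indices to contiguous shards. The function name and signature are assumptions for this example only, not Playwright's actual implementation:

```typescript
// Hypothetical sketch: split `testCount` tests into contiguous ranges
// whose sizes are proportional to the given weights. Illustrative only,
// NOT the implementation in this PR.
function shardRanges(testCount: number, weights: number[]): Array<[number, number]> {
  const totalWeight = weights.reduce((a, b) => a + b, 0);
  const ranges: Array<[number, number]> = [];
  let start = 0;
  let accumulated = 0;
  for (let i = 0; i < weights.length; i++) {
    accumulated += weights[i];
    // The last shard always ends at testCount to avoid rounding drift.
    const end = i === weights.length - 1
      ? testCount
      : Math.round((accumulated / totalWeight) * testCount);
    ranges.push([start, end]);
    start = end;
  }
  return ranges;
}

// With weights 26:24:25:25 over 100 tests, the shards would cover
// indices [0,26), [26,50), [50,75), [75,100).
```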
Test results for "MCP": 2727 passed, 116 skipped. Merge workflow run.

Test results for "tests 1": 1 failed, 1 flaky, 34396 passed, 689 skipped. Merge workflow run.
@Skn0tt what release will include these changes?
We're planning for 1.58 to include this.
I don't see how this solves the distribution problem raised in the various referenced issues. Consider an unbalanced suite with 1000 tests that take 1s each (~17 minutes total) and two tests that each take 20 minutes, but are unluckily the last tests in the list. I don't think I can use weights to evenly balance this into 2 shards. If the shards are weighted to split up the two 20-minute tests, the first shard will be 37 minutes and the second 20 minutes. If the two are in the same shard, the first shard will be at most 17 minutes and the second will be at least 40 minutes. So, this is not actually a viable alternative to #38588.
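The arithmetic in that scenario can be checked with a quick sketch (illustrative only; durations in seconds, and contiguous cuts stand in for what weighted sharding can express):

```typescript
// 1000 tests at 1s each, followed by two 20-minute (1200s) tests.
const suite: number[] = [...Array(1000).fill(1), 1200, 1200];

// Total duration of each shard when the suite is cut into two
// contiguous halves at index `cut`.
function shardTimes(durations: number[], cut: number): [number, number] {
  const sum = (xs: number[]) => xs.reduce((a, b) => a + b, 0);
  return [sum(durations.slice(0, cut)), sum(durations.slice(cut))];
}

// Splitting the two heavy tests apart: ~37 min vs 20 min.
// shardTimes(suite, 1001) → [2200, 1200]
// Keeping them together: ~17 min vs 40 min.
// shardTimes(suite, 1000) → [1000, 2400]
```

Either way, one shard runs at least 20 minutes longer than a perfect split would, because no contiguous cut can separate the heavy tail evenly.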
+1 to the above – I'm not convinced this really solves the problem, or if it does, only in a way which doesn't seem very user-friendly. If I'm not mistaken, this just means you can control how many tests are assigned to each shard? So you need to evaluate ahead of time how the tests will bucket, right? It won't affect the ordering of them. If you have 6 heavy tests in direct sequence (e.g. 10 minutes each) while most of the rest of the 300 tests take seconds, it doesn't seem like there is any conceivable way to come close to balancing them with this implementation, except having nearly 300 shards.

I'm not sure I understand the scenarios this is solving for though, so maybe this is my own misunderstanding or lack of imagination. Doesn't this only really address cases where the tests have a sort of normal distribution of length? So you'd compute (how, I'm not sure) where to redraw the boundaries to rebalance them?

My situation, which I feel from the above comment from @gpaciga is not unique, is that we have a roughly even distribution of runtimes, with a small number of clustered extreme outliers. If I have 6 outliers in a suite of 300 tests, with 6 shards, the optimal distribution is that none of these six run in the same shard as any of the others (or close to it). But this weighting wouldn't appear to make that possible – and even if it did, wouldn't new tests distort the distribution?
@Skn0tt great to see sharding getting some attention! However, I wonder what the reason is that this sharding-weights approach is considered an alternative to sharding based on duration / timing data?
That's good feedback, thanks. We were thinking that weights are easier to maintain than duration data, but you're right that for uneven scenarios this doesn't solve the ordering problem, whereas something like greedy bin-packing based on duration data would. I'll take this feedback back to the drawing board.
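As a sketch of that greedy bin-packing idea (longest-processing-time-first; the function and names here are illustrative assumptions, not a committed design):

```typescript
// Illustrative sketch of duration-based greedy bin-packing (LPT):
// sort tests by recorded duration, then repeatedly assign the next
// test to the currently lightest shard. Not a Playwright API.
function packShards(
  durations: Array<[string, number]>,
  shardCount: number,
): string[][] {
  const shards = Array.from({ length: shardCount }, () => ({
    tests: [] as string[],
    total: 0,
  }));
  const byDuration = [...durations].sort((a, b) => b[1] - a[1]);
  for (const [test, seconds] of byDuration) {
    // Pick the shard with the smallest accumulated duration so far.
    const lightest = shards.reduce((a, b) => (a.total <= b.total ? a : b));
    lightest.tests.push(test);
    lightest.total += seconds;
  }
  return shards.map((s) => s.tests);
}
```

In the 1000×1s + 2×1200s scenario discussed above, the two heavy tests land on different shards and the fast tests fill in around them, which contiguous weighted sharding cannot achieve.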
@Skn0tt the shard weights as implemented in this PR help when your shards always run on the same machines and those machines have different performance specs. E.g. shard 1/4 runs on a high-performing machine and can therefore take more weight than the other shards. However, if you utilize cloud infrastructure to run your tests, it's very likely that all shards run on machines with equal performance specs. It would be great if you could have a different weight per test and the sharding automatically distributed tests based on these per-test weights... wdyt?
I see how shard weights can help balance across different machine types, but shards typically all use the same specs. Weights can still help, because sharding is stable: the same test always lands in the same shard, and if a significant number of expensive tests happens to land in one shard, that shard will take longer. That's what's happening in our own tests: we have roughly 40k tests across different bots and 2 shards, and we found that shard 2 always took longer because its tests are just slightly longer, and that compounds at scale. So assigning more weight to shard 1 allows us to iron out the imbalance; see #38635 for numbers. This works best in large test suites where the imbalance compounds and the number of tests is stable. It's not as effective in the case outlined above, where a small number of disproportionately expensive tests all land in the same shard. #38588 is the alternative that solves both use cases, but we were afraid of the usability of committing duration stats into the repo. We'll reevaluate.
@Skn0tt There is no need to commit the durations to the repository. We're also not doing that; they just serve as a way to optimize overall CI time. In our setup (still using the changes of #30962, for two years already) we have the duration / timing data in the
I see! What CI runner are you using, and how are you restoring the file across builds?
We're using Jenkins Pipeline. The Archive step allows us to make the last-run.json available to other pipelines... however, I'm pretty sure you can use a post-build step in any other CI solution to store the test durations of the last successful main build in some place that you can then use for the next build. E.g. you could even post it as a gist or upload it to an AWS S3 bucket...
FWIW I've been using the same approach as @muhqu for approximately 6 months, on GitHub Actions, archiving
Inspired by #38598, alternative to #38588. Let's play around with it and see how it works for our own CI.